A Proposed Architecture for Continuous Web Monitoring Through Online Crawling of Blogs

نویسندگان

  • Mehdi Naghavi
  • Mohsen Sharifi
چکیده

Getting informed of what is registered in the Web space on time, can greatly help the psychologists, marketers and political analysts to familiarize, analyse, make decision and act correctly based on the society`s different needs. The great volume of information in the Web space hinders us to continuously online investigate the whole space of the Web. Focusing on the considered blogs limits our working domain and makes the online crawling in the Web space possible. In this article, an architecture is offered which continuously online crawls the related blogs, using focused crawler, and investigates and analyses the obtained data. The online fetching is done based on the latest announcements of the ping server machines. A weighted graph is formed based on targeting the important key phrases, so that a focused crawler can do the fetching of the complete texts of the related Web pages, based on the weighted graph.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Image flip CAPTCHA

The massive and automated access to Web resources through robots has made it essential for Web service providers to make some conclusion about whether the "user" is a human or a robot. A Human Interaction Proof (HIP) like Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) offers a way to make such a distinction. CAPTCHA is a reverse Turing test used by Web serv...

متن کامل

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

Incremental Web Crawling as a Competitive Game of Learning Automata

There is no doubt that the World Wide Web has lived up to it’s hype of being the world’s central information highway through the past years. An increasing amount of versatile services keeps finding their way onto the Web as information providers continue to embrace the possibilities that the Web can offer. Especially the possibility of producing dynamic content has been an accelerant factor and...

متن کامل

On Social Network Web Sites: Definition, Features, Architectures and Analysis Tools

Development and usage of online social networking web sites are growing rapidly. Millions members of these web sites publicly articulate mutual "friendship" relations and share user-created contents, such as photos, videos, files, and blogs. The advances in web designing technology and fast growing usage of online resources prompted web designers to improve features and architectures of social ...

متن کامل

On Social Network Web Sites: Definition, Features, Architectures and Analysis Tools

Development and usage of online social networking web sites are growing rapidly. Millions members of these web sites publicly articulate mutual "friendship" relations and share user-created contents, such as photos, videos, files, and blogs. The advances in web designing technology and fast growing usage of online resources prompted web designers to improve features and architectures of social ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1202.1837  شماره 

صفحات  -

تاریخ انتشار 2012